**Parallel Computing of Graph-based Functions in ReRAM**

Due to the constant decrease of feature size, CMOS is approaching its physical limits, necessitating the search for potential future technologies beyond the scaling limit. Resistive Random Access Memory (ReRAM) is a non-volatile memory device with low power consumption, built-in computing capabilities, and high logic synthesis efficiency.

In the Binary Decision Diagram (BDD) method, the Multiply-Accumulate (MAC) operation is used in place of logic primitives, and the BDD nodes are mapped directly onto parallel MAC operations.

The And-Inverter Graph (AIG) is a graph representation of Boolean functions that serves as the basis for an automated compilation technique targeting in-memory computing architectures. Any Boolean function can be expressed as an AIG.

In this graph-based representation, nodes correspond to two-input AND gates and edges represent the wires between them. An edge can be complemented to represent an inverter between nodes.
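The node-and-edge structure described above can be sketched in a few lines of Python. This is a minimal illustration rather than the compilation flow from the paper; the names `Edge`, `AndNode`, and `evaluate` are hypothetical and chosen only for this example.

```python
from dataclasses import dataclass
from typing import Dict, Union


@dataclass(frozen=True)
class Edge:
    """A wire into a node; `complemented` models an inverter on the wire."""
    target: Union["AndNode", str]  # child AND node, or the name of a primary input
    complemented: bool = False


@dataclass(frozen=True)
class AndNode:
    """A two-input AND gate."""
    left: Edge
    right: Edge


def evaluate(edge: Edge, inputs: Dict[str, bool]) -> bool:
    """Evaluate the function seen through `edge` for one input assignment."""
    if isinstance(edge.target, str):
        value = inputs[edge.target]
    else:
        value = evaluate(edge.target.left, inputs) and evaluate(edge.target.right, inputs)
    return (not value) if edge.complemented else value


# XOR(a, b) built only from AND nodes and complemented edges:
# (NOT (a AND b)) AND (NOT (NOT a AND NOT b))
nand_ab = Edge(AndNode(Edge("a"), Edge("b")), complemented=True)
or_ab = Edge(AndNode(Edge("a", True), Edge("b", True)), complemented=True)
xor_ab = Edge(AndNode(nand_ab, or_ab))
```

Calling `evaluate(xor_ab, {"a": True, "b": False})` walks the graph recursively. XOR is used here only because it shows that AND nodes plus inverted edges are enough to express a function that is not itself an AND.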

In ReRAM MAC computation, multiple MAC operations are performed simultaneously.

Among graph-based approaches, Majority-Inverter Graphs (MIGs) are the state-of-the-art structure for ReRAM-based synthesis, since they need the fewest operations and devices.

A computation is wordline parallel if it uses one wordline and multiple bitlines, bitline parallel if it uses one bitline and multiple wordlines, and mixed parallel if it uses multiple wordlines and multiple bitlines in parallel.
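The three definitions above amount to a simple case distinction on how many wordlines and bitlines a computation touches. A small sketch, assuming a hypothetical helper `classify_parallelism`; the "not parallel" label for the single-wordline, single-bitline case is an assumption of this example and is not a term from the source.

```python
def classify_parallelism(num_wordlines: int, num_bitlines: int) -> str:
    """Classify a crossbar computation by the lines it uses."""
    if num_wordlines < 1 or num_bitlines < 1:
        raise ValueError("a computation uses at least one wordline and one bitline")
    if num_wordlines == 1 and num_bitlines > 1:
        return "wordline parallel"
    if num_bitlines == 1 and num_wordlines > 1:
        return "bitline parallel"
    if num_wordlines > 1 and num_bitlines > 1:
        return "mixed parallel"
    # Assumption: one wordline and one bitline involves no parallelism.
    return "not parallel"
```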

Every node in BDD-based Parallel Computation must be implemented as a 2x1 multiplexer.
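To see why a multiplexer fits a MAC-based approach, note that a 2x1 multiplexer on 0/1 values can be written as `select*high + (1 - select)*low`, which is a single multiply-accumulate. The sketch below shows that identity only; the function names are hypothetical, and how the paper maps these operations onto ReRAM devices is not modelled here.

```python
def mac(weights, values):
    """Multiply-accumulate: the sum of element-wise products."""
    return sum(w * v for w, v in zip(weights, values))


def bdd_node(select: int, high: int, low: int) -> int:
    """A BDD node as a 2x1 multiplexer, computed with one MAC.

    Returns `high` when select is 1 and `low` when select is 0.
    """
    return mac([select, 1 - select], [high, low])


def xor_bdd(a: int, b: int) -> int:
    """Evaluate the BDD of a XOR b, with terminals 0 and 1."""
    # The two lower-level nodes both test b and do not depend on each other,
    # so in principle they could be evaluated in parallel.
    b_node = bdd_node(b, 1, 0)      # f = b
    not_b_node = bdd_node(b, 0, 1)  # f = NOT b
    # Root node tests a: if a is 1 take NOT b, otherwise take b.
    return bdd_node(a, not_b_node, b_node)
```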

In AIG-based parallel computation, two nodes can be computed in parallel only if all children of both nodes have already been computed, there is no data dependency between the nodes, and they share a wordline operand. Their host devices must lie in the same wordline, and the content of those devices must not be required for any other computation.

In m-AIG-based parallel computation, each node represents either an m-input AND gate or a primary input (PI). Each input edge is linked either to the constant 1 or to a child node; an edge linked to a node can be complemented to indicate inversion.
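The m-input node described above can be illustrated as follows. This is a sketch under stated assumptions: an unused input is represented by `None` and treated as the constant 1, and the helper name `m_input_and` is hypothetical.

```python
from typing import Optional, Sequence, Tuple

# An input edge is either None (tied to the constant 1) or a pair
# (child_value, complemented), where child_value is 0 or 1.
InputEdge = Optional[Tuple[int, bool]]


def m_input_and(edges: Sequence[InputEdge], m: int) -> int:
    """Evaluate one m-input AND node; missing inputs are tied to constant 1."""
    if len(edges) > m:
        raise ValueError("more input edges than the node has inputs")
    padded = list(edges) + [None] * (m - len(edges))
    result = 1
    for edge in padded:
        if edge is None:
            bit = 1  # constant 1 does not affect the AND
        else:
            value, complemented = edge
            bit = 1 - value if complemented else value
        result &= bit
    return result
```

For example, `m_input_and([(1, False), (0, True)], m=4)` computes `1 AND (NOT 0) AND 1 AND 1`. Tying unused inputs to 1 is what lets a node with fewer than m real children still be expressed as an m-input gate.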

The proposed method drastically decreases the number of devices and operations required; on average, it reduces the number of operations by 66 percent. BDD and AIG both perform well in terms of area and operation count, and m-AIG outperforms AIG for smaller values of m.

**Power Aware Computing**

Power, energy, and temperature all constrain processor design. With CPU clock frequencies stagnating, future gains in energy efficiency depend on parallelism. Beyond hardware, software design can make a substantial difference, so the ability to measure power and energy usage is essential. This study used the PAPI (Performance Application Programming Interface) library, which provides generic, portable access to hardware counters in the CPU and other components. Experiments were conducted on Intel's Xeon Phi Knights Landing (KNL) processor architecture, and the analysis used dense linear algebra (DLA) kernels, specifically BLAS kernels.

Kernels common in high-performance computing applications were chosen to examine how the application affects power consumption and energy requirements. Level 1 BLAS covers scalar and vector operations, level 2 covers matrix-vector operations, and level 3 covers matrix-matrix operations. Levels 1 and 2 belong to the memory-bound class, whereas level 3 belongs to the compute-intensive class.
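The split between memory-bound and compute-intensive levels can be made concrete by estimating arithmetic intensity, the number of floating-point operations per byte moved. The sketch below uses simplified, leading-order operation and traffic counts for square problems in double precision; these counts are an assumption of this example (they ignore caching and the scalar factors) and are not measurements from the study.

```python
BYTES_PER_DOUBLE = 8


def daxpy_intensity(n: int) -> float:
    """Level 1, y = a*x + y: about 2n flops, 3n doubles moved (read x, y; write y)."""
    return (2 * n) / (3 * n * BYTES_PER_DOUBLE)


def dgemv_intensity(n: int) -> float:
    """Level 2, y = A*x + y: about 2n^2 flops, n^2 + 3n doubles moved."""
    return (2 * n * n) / ((n * n + 3 * n) * BYTES_PER_DOUBLE)


def dgemm_intensity(n: int) -> float:
    """Level 3, C = A*B + C: about 2n^3 flops, 4n^2 doubles moved."""
    return (2 * n ** 3) / (4 * n * n * BYTES_PER_DOUBLE)


if __name__ == "__main__":
    for n in (1000, 4000):
        print(n, daxpy_intensity(n), dgemv_intensity(n), dgemm_intensity(n))
```

Under these counts the level 1 and level 2 intensities stay below 0.25 flops per byte regardless of problem size, while the level 3 intensity grows linearly with n, which is why dgemm can keep the cores busy and dgemv is limited by memory bandwidth.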

The study presents an analysis of the compute-intensive routine dgemm and the memory-bound routine dgemv.

The PAPI library provides a consistent mechanism for collecting performance counter data from various hardware and software components. PAPI includes a number of components that allow for the monitoring of power usage and consumption via various interfaces.

In flat mode, the MCDRAM is not used as a cache but is exposed as physically addressable memory; the behavior of the dgemm kernel was examined in this mode.

For the dgemv kernel, performance differs between the two memories: the results are the same as in hybrid mode, except that DDR4 is four times slower than MCDRAM.

This research reveals that using high bandwidth MCDRAM on KNL is critical for high efficiency and low power consumption, and that if the application is computation heavy, Hybrid mode is the best option.

**Temperature-Aware Computer Systems: Opportunities and Challenges**

Power-aware design alone has not been enough to prevent problems such as rising heat density. Localized heating occurs much faster than chip-wide heating. Typical high-power applications still operate 20% or more below the worst case.

Architectural-level thermal management is needed because each computing system's architecture domain is distinct, and the workload must be managed to regulate instruction-level parallelism. Hot spots and temperature gradients have to be accounted for in the design of the computer system, and both system design and the operating system play a critical role.

Thermal modelling at the design level is required to eliminate temporal and spatial nonuniformities in computation caused by thermal impacts.

A compact thermal model of a parametric microarchitecture must track temperatures at the granularity of individual microarchitectural units; be constructed so that a new compact model can be created for different microarchitectures; solve the RC circuit's differential equations quickly; and be independent of boundary and initial conditions.
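The RC-circuit view of heat flow mentioned above can be illustrated with the simplest possible case: a single thermal node with resistance R to ambient and capacitance C, governed by C dT/dt = P - (T - T_ambient)/R. The sketch integrates this equation with forward Euler. It is a one-node toy, not the multi-unit compact model the text describes, and all parameter values in the usage note are illustrative assumptions.

```python
def simulate_rc(power_w, r_k_per_w, c_j_per_k, t_ambient, t_initial, dt, steps):
    """Forward-Euler integration of C * dT/dt = P - (T - T_ambient) / R.

    Returns the list of temperatures, starting with t_initial.
    dt should be small compared with the time constant R * C.
    """
    temps = [t_initial]
    t = t_initial
    for _ in range(steps):
        d_t_dt = (power_w - (t - t_ambient) / r_k_per_w) / c_j_per_k
        t += dt * d_t_dt
        temps.append(t)
    return temps


def steady_state(power_w, r_k_per_w, t_ambient):
    """Temperature at which heat in equals heat out: T = T_ambient + P * R."""
    return t_ambient + power_w * r_k_per_w
```

For instance, `simulate_rc(50.0, 0.8, 10.0, 45.0, 45.0, 0.01, 10000)` starts at ambient and approaches `steady_state(50.0, 0.8, 45.0)` over several time constants. A model at microarchitectural granularity would use one such node per unit, coupled through additional resistances.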

For temperature-tracking dynamic frequency scaling, a plot of temperature against average power density is provided for the gcc benchmark, using a power-averaging interval of 0.033 seconds.

Temperature-tracking dynamic frequency scaling relies on the temperature dependence of carrier mobility in CMOS, which makes frequency linearly dependent on operating temperature as well. When an application reaches the temperature limit, the frequency can simply be reduced to compensate for the increased temperature.

Dynamic Voltage Scaling (DVS) is a thermal management approach. Because circuits switch more slowly as the operating voltage approaches the threshold voltage, the processor's frequency must be lowered in tandem with its voltage.
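The coupling between voltage and frequency can be sketched with two textbook approximations that are not taken from the source: the alpha-power-law estimate of maximum frequency, f proportional to (V - Vth)^alpha / V, and the dynamic power relation P = activity * C * V^2 * f. The constants `k`, `alpha`, `c_eff`, and `activity` below are hypothetical, normalized values chosen for illustration.

```python
def max_frequency(vdd: float, vth: float, k: float = 1.0, alpha: float = 1.3) -> float:
    """Alpha-power-law estimate of the highest usable frequency at supply vdd.

    The result falls towards zero as vdd approaches the threshold voltage vth.
    """
    if vdd <= vth:
        return 0.0
    return k * (vdd - vth) ** alpha / vdd


def dynamic_power(vdd: float, freq: float, c_eff: float = 1.0, activity: float = 1.0) -> float:
    """Dynamic switching power: activity * C * V^2 * f."""
    return activity * c_eff * vdd ** 2 * freq


def scale_voltage(vdd: float, vth: float):
    """Return (frequency, power) when running at the frequency vdd allows."""
    freq = max_frequency(vdd, vth)
    return freq, dynamic_power(vdd, freq)
```

Comparing `scale_voltage(1.2, 0.3)` with `scale_voltage(1.0, 0.3)` shows the trade-off: the lower voltage forces a lower frequency, and power drops by more than the frequency does because of the quadratic voltage term.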

According to the findings, Migrating Computation (MC) is the best dynamic thermal management (DTM) technique at 0.8 K/W, for three reasons: the floorplan alone is sufficient to lower the operating temperature of the primary integer register file; MC can use instruction-level parallelism (ILP) to hide the extra latency of the spare register file; and completely eliminating activity in the primary register file allows it to cool quickly, reducing use of the slower secondary register file.